
# Run regressions

### Hand files
* `config.yaml` -- This is the main driver of this task. Here you define all of the regressions you want to run (using the parameters described below).
* `outcome_sets.yaml`
* `covariate_sets.yaml`
* `sample_sets.yaml`
* `subgroups.yaml`

### Source files
* `src/run_regressions.R` -- Main driver of the code. Runs each regression defined in `config.yaml` in sequence, parallelizing over different specifications (e.g. combinations of outcome, covariate set, and subgroup).
* `R/regression_functions.R` -- Functions referenced by the `regression_functions` parameter in `config.yaml`. These are high-level functions such as `run_itt`, `run_joint_baseline`, `run_covid_regressions`, and `run_fwer_resampling_itt`.
* `R/generic_helper_functions.R` -- Helper functions that don't just apply to one specific regression task. These functions range in scope from broad regression setup (e.g. `load_analysis_file` and `get_covariates`) to specific and unit-testable (e.g. `run_lm`, `get_tot_ccm`, `winz`).
* `R/pathway_reweighted_helper_functions.R` -- Helper functions specific to the pathway-reweighted regressions
* `R/covid_helper_functions.R` -- Helper functions specific to the covid regressions

Note: this task also relies on a few functions in `project_functions.R` (e.g. `get_stars` and `get_outcomes`).

### Setting up the config.yaml

Some parameters are required for every regression:
* `regression_type`: determines which data is read in and which lm function is used for ITTs and TOTs (Options: "xsection" or "panel")
* `sample`: determines which rows to run the regression on (Options: "full" or a key in the sample_sets.yaml file)
* `outcome_set_list`: determines which yvars are run (Options: key in the outcome_sets.yaml file)
* `covariate_set_list`: determines which xvars are run (Options: keys in the covariate_sets.yaml file, either a single key or several keys joined by "__")
* `regression_functions`: list of functions to run (Options: functions defined in `R/regression_functions.R`)
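
As a rough illustration of the required parameters, a single regression entry might look like the sketch below. The top-level key (`my_itt_regression`), the set names (`main_outcomes`, `baseline`, `demographics`), and the assumption that each regression is a top-level mapping in `config.yaml` are all hypothetical; check the existing entries in `config.yaml` for the actual layout.

```yaml
# Hypothetical regression entry; the name and set keys are illustrative only.
my_itt_regression:
  regression_type: xsection        # or "panel"
  sample: full                     # or a key in sample_sets.yaml
  outcome_set_list:
    - main_outcomes                # assumed key in outcome_sets.yaml
  covariate_set_list:
    - baseline                     # assumed single key in covariate_sets.yaml
    - baseline__demographics       # assumed: two covariate sets joined by "__"
  regression_functions:
    - run_itt                      # defined in R/regression_functions.R
```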

Some parameters are optional because default values are defined:
* `subgroup_list` (Default: [])
* `window_cut_list` (Default: the `window_cut_list` defined in the project config)
* `robust` (Default: true)
* `subgroup_ftest` (Default: false)
* `append_window_to_yvar` (Default: true)
* `append_window_to_takeup` (Default: true)
* `pre_processing_functions` (Default: [])
* `post_processing_functions` (Default: [])
* `skip_basic_descriptive_stats` (Default: false)

Note: some parameters only make sense for certain regressions. Any extra parameter that a function needs can be defined in `config.yaml` under the relevant regression and accessed in the code via `more_params$<my_custom_parameter>`. For example, `n_samples` is defined and used only in the FWER resampling regressions, and `cr_phat_upper_bound__ul` and `cr_phat_upper_bound__re` only in the pathway-reweighted regressions.
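
A minimal sketch of a regression-specific parameter, assuming the same entry layout as the rest of `config.yaml`. The regression name (`fwer_itt`), the set keys, and the value `1000` are hypothetical; only `n_samples` and `run_fwer_resampling_itt` come from this task.

```yaml
# Hypothetical entry showing a custom parameter next to the required ones.
fwer_itt:
  regression_type: xsection
  sample: full
  outcome_set_list:
    - main_outcomes                # assumed key in outcome_sets.yaml
  covariate_set_list:
    - baseline                     # assumed key in covariate_sets.yaml
  regression_functions:
    - run_fwer_resampling_itt
  n_samples: 1000                  # custom parameter; value is illustrative
```

Under this sketch, the regression function would read the value as `more_params$n_samples`.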

